Zero-day Malware Detection based on Supervised Learning Algorithms of API call Signatures

نویسندگان

Mamoun Alazab

Sitalakshmi Venkatraman

Paul A. Watters

Moutaz Alazab

چکیده

Zero-day or unknown malware are created using code obfuscation techniques that can modify the parent code to produce offspring copies which have the same functionality but with different signatures. Current techniques reported in literature lack the capability of detecting zero-day malware with the required accuracy and efficiency. In this paper, we have proposed and evaluated a novel method of employing several data mining techniques to detect and classify zero-day malware with high levels of accuracy and efficiency based on the frequency of Windows API calls. This paper describes the methodology employed for the collection of large data sets to train the classifiers, and analyses the performance results of the various data mining algorithms adopted for the study using a fully automated tool developed in this research to conduct the various experimental investigations and evaluation. Through the performance results of these algorithms from our experimental analysis, we are able to evaluate and discuss the advantages of one data mining algorithm over the other for accurately detecting zero-day malware successfully. The data mining framework employed in this research learns through analysing the behavior of existing malicious and benign codes in large datasets. We have employed robust classifiers, namely Naïve Bayes (NB) Algorithm, k−Nearest Neighbor (kNN) Algorithm, Sequential Minimal Optimization (SMO) Algorithm with 4 differents kernels (SMO Normalized PolyKernel, SMO – PolyKernel, SMO – Puk, and SMORadial Basis Function (RBF)), Backpropagation Neural Networks Algorithm, and J48 decision tree and have evaluated their performance. _____________________________ Copyright © 2011, Australian Computer Society, Inc. This paper appeared at the 9th Australasian Data Mining Conference (AusDM 2011), Ballarat, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121, Peter Vamplew, Andrew Stranieri, Kok-Leong Ong, Peter Christen and Paul Kennedy, Eds. Reproduction for academic, not-for profit purposes permitted provided this text is included Overall, the automated data mining system implemented for this study has achieved high true positive (TP) rate of more than 98.5%, and low false positive (FP) rate of less than 0.025, which has not been achieved in literature so far. This is much higher than the required commercial acceptance level indicating that our novel technique is a major leap forward in detecting zero-day malware. This paper also offers future directions for researchers in exploring different aspects of obfuscations that are affecting the IT world today.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

To control the exponential growth of malware files, security analysts pursue dynamic approaches that automatically identify and analyze malicious software samples. Obfuscation and polymorphism employed by malwares make it difficult for signature-based systems to detect sophisticated malware files. The dynamic analysis or run-time behavior provides a better technique to identify the threat. In t...

متن کامل

DroidCat: Unified Dynamic Detection of Android Malware

Various dynamic approaches have been developed to detect or categorize Android malware. These approaches execute software, collect call traces, and then detect abnormal system calls or sensitive API usage. Consequently, attackers can evade these approaches by intentionally obfuscating those calls under focus. Additionally, existing approaches treat detection and categorization of malware as sep...

متن کامل

Artificial Immune Clonal Selection Classification Algorithms for Classifying Malware and Benign Processes Using API Call Sequences

Machine learning is an important field of artificial intelligence in which models are generated by extracting rules and functions from large datasets. Machine learning includes a diversity of methods and algorithms such as decision trees, lazy learning, knearest neighbors, Bayesian methods, Gaussian processes, artificial neural networks, support vector machines, kernel algorithms, and artificia...

متن کامل

Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability

This paper proposes a technique for automatically learning semantic malware signatures for Android from very few samples of a malware family. The key idea underlying our technique is to look for a maximally suspicious common subgraph (MSCS) that is shared between all known instances of a malware family. An MSCS describes the shared functionality between multiple Android applications in terms of...

متن کامل

A multi-task learning model for malware classification with useful file access pattern from API call sequence

Based on API call sequences, semantic-aware and machine learning (ML) based malware classifiers can be built for malware detection or classification. Previous works concentrate on crafting and extracting various features from malware binaries, disassembled binaries or API calls via static or dynamic analysis and resorting to ML to build classifiers. However, they tend to involve too much featur...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2011

Zero-day Malware Detection based on Supervised Learning Algorithms of API call Signatures

نویسندگان

چکیده

منابع مشابه

DyVSoR: dynamic malware detection based on extracting patterns from value sets of registers

DroidCat: Unified Dynamic Detection of Android Malware

Artificial Immune Clonal Selection Classification Algorithms for Classifying Malware and Benign Processes Using API Call Sequences

Automated Synthesis of Semantic Malware Signatures using Maximum Satisfiability

A multi-task learning model for malware classification with useful file access pattern from API call sequence

عنوان ژورنال:

اشتراک گذاری